Dyr og Mennesker

Piglet mortality

Gavin Simpson

Aarhus University

2025-12-03

Piglet mortality

Piglet mortality

Is the mortality rate of piglets related to the increasing size of litters?

Photo by Veronica White on Unsplash

Photo by Alan Roberts on Unsplash

Data set

# load data
piglet <- read_tsv("data/Teaching_data_Dyr&Mennesker_sow_data.txt") |>
  janitor::clean_names() |>
  rename(
    stillborn = stilborn,
    field = group,
    avg_birthw_kg = litter_mean_birth_weight,
    n_teats = number_functional_teats,
    avg_weight_d10_kg = litter_mean10day_weight,
    total_birthw_kg = litter_total_birth_weight,
    total_weight_d10_kg = litter_total10day_weight
  ) |>
  mutate(
    field = factor(field),
    parity = factor(parity),
    avg_birthw_kg = avg_birthw_kg / 1000
  ) |>
  select(
    -c(sow_breed, litter_breed)
  )

Data set

# A tibble: 215 × 13
       id field parity n_teats liveborn stillborn totalborn live_day4 live_day11
    <dbl> <fct> <fct>    <dbl>    <dbl>     <dbl>     <dbl>     <dbl>      <dbl>
 1 1.42e8 50    1           18       16         0        16        15         15
 2 1.39e8 50    3           16       11         0        11        11         11
 3 1.36e8 50    5           16       16         1        17        12         11
 4 8.00e8 50    5           16       14         0        14        13         12
 5 1.36e8 50    5           17       20         1        21        18         18
 6 1.39e8 50    3           17       17         1        18        16         16
 7 1.36e8 50    5           16       14         0        14        10          9
 8 1.42e8 50    1           15       14         0        14        12         12
 9 1.41e8 50    2           16       14         1        15        13         13
10 1.39e8 50    5           15       16         0        16        16         15
# ℹ 205 more rows
# ℹ 4 more variables: avg_birthw_kg <dbl>, avg_weight_d10_kg <dbl>,
#   total_birthw_kg <dbl>, total_weight_d10_kg <dbl>

Data set

In this walkthrough example, I’ll consider the stillborn variable and investigate the roll of litter size totalborn in the number of still born piglets, controlling for parity and field

  • parity is the number of times the sow has become pregnant and given birth to a litter,

  • field is an indicator of which sows were housed in the same field

Generalised Linear Models

Generalised Linear Models

Generalised Linear Models (GLMs) are a powerful extension to the linear regression model, extending the types of data & conditional distributions that can be modelled beyond the normal or Gaussian distribution of linear regression

  • Binary (dichotomous) variables can be modelled using logistic regression

  • Count data can be modelled using Poisson regression

  • Milk yield, animal weight, or other positive, continuous variables can be modelled using a gamma regression

All are special cases of the broad class of GLMs

Also includes linear regression as a special case

Generalised Linear Models

GLMs allow the conditional distribution of the response to be any distribution from the exponential family; Poisson, binomial, Gaussian, gamma, multinomial, …

There are three parts to a GLM

  • The conditional distribution of \(y\)
  • The linear predictor \(\eta\), and
  • The link function

Whilst this affords a wealth of options, often natural choices for the conditional distribution of \(y\) and the link function arise from type of data being modelled

\[\begin{align*} y_i &\sim \textcolor{red}{\text{EF}}(\mu_i, \phi) \\ \mathbb{E}(y_i) &= \mu_i \\ \textcolor{red}{g}(\mu_i) &= \textcolor{red}{\eta_i} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \cdots \beta_j x_{ji} \end{align*}\]

GLM: Conditional distribution

In a GLM we want a model for the expectation of \(y\), \(\mathbb{E}(y_i)\), which here will usually be the mean of the response, \(\mu_i\)

We might model \(\mu_i\) as following a Poisson distribution if the data were counts, or as a binomial distribution in the case of 0/1 data, as we did the in the high SCC example

GLM: Linear predictor

We need to decide which predictor variables and any transformations of them should be used to predict \(y\); this is the linear predictor, \(\eta\)

\[\eta_i = \beta_0 + \beta_{i1} x_{i1} + \beta_{i2} x_{i2} + \cdots + \beta_{ij}\]

However, was we saw in the Gaussian linear model, there is nothing stopping the above equation returning values that don’t make sense for the variable we are modelling

Plot the data

Plot the data

What model should we use?

We want to model the number of stillborn piglets

Discuss amongst yourselves what kind of GLM we might fit to these data

  • What conditional distribution for stillborn?

  • What types of data are the predictor variables:

    1. totalborn
    2. parity
    3. field
05:00

Binomial counts

Model is for counts from a total (litter size) and the response is the proportion stillborn

m_still <- glm(
  cbind(stillborn, liveborn) ~ field + parity + totalborn,
  family = binomial(link = "logit"),
  data = piglet
)

Poisson counts

Model is for a count and the response is the rate of stillborn piglets

m_still_p <- glm(
  stillborn ~ field + parity + totalborn,
  family = poisson(link = "log"),
  data = piglet
)

Models are very similar

Models are very similar

m_still |> summary()

Call:
glm(formula = cbind(stillborn, liveborn) ~ field + parity + totalborn, 
    family = binomial(link = "logit"), data = piglet)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -7.76015    0.82889  -9.362  < 2e-16 ***
field51      0.63672    0.62610   1.017   0.3092    
field52      1.16649    0.60488   1.928   0.0538 .  
field53      1.14083    0.56758   2.010   0.0444 *  
field54      0.81162    0.57156   1.420   0.1556    
field55      0.03616    0.64790   0.056   0.9555    
field56      0.37832    0.58910   0.642   0.5207    
field57     -0.75980    0.87991  -0.863   0.3879    
field58     -0.34511    0.72463  -0.476   0.6339    
field59     -1.14145    1.14346  -0.998   0.3182    
field60      0.29533    0.68365   0.432   0.6658    
parity2      0.34851    0.38488   0.905   0.3652    
parity3      0.12777    0.34403   0.371   0.7104    
parity4      0.40732    0.49322   0.826   0.4089    
parity5      0.08041    0.35790   0.225   0.8222    
totalborn    0.22058    0.04107   5.371 7.81e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 276.94  on 214  degrees of freedom
Residual deviance: 200.29  on 199  degrees of freedom
AIC: 353.43

Number of Fisher Scoring iterations: 6

Models are very similar

m_still_p |> summary()

Call:
glm(formula = stillborn ~ field + parity + totalborn, family = poisson(link = "log"), 
    data = piglet)

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -5.83141    0.80588  -7.236 4.62e-13 ***
field51      0.61790    0.61805   1.000   0.3174    
field52      1.12261    0.59386   1.890   0.0587 .  
field53      1.06945    0.55755   1.918   0.0551 .  
field54      0.77475    0.56302   1.376   0.1688    
field55      0.02706    0.63901   0.042   0.9662    
field56      0.35612    0.58082   0.613   0.5398    
field57     -0.77807    0.86945  -0.895   0.3708    
field58     -0.33112    0.71682  -0.462   0.6441    
field59     -1.13333    1.13668  -0.997   0.3187    
field60      0.27017    0.67303   0.401   0.6881    
parity2      0.33288    0.37676   0.884   0.3770    
parity3      0.10419    0.33743   0.309   0.7575    
parity4      0.38498    0.48233   0.798   0.4248    
parity5      0.06896    0.35116   0.196   0.8443    
totalborn    0.27247    0.03949   6.899 5.23e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 298.73  on 214  degrees of freedom
Residual deviance: 191.84  on 199  degrees of freedom
AIC: 350.83

Number of Fisher Scoring iterations: 6

Model diagnostics

m_still_p |> gglm() # looks bad, but...

Model diagnostics

library("DHARMa")
pois_res <- m_still_p |> DHARMa::simulateResiduals()

plot(pois_res)

Mortality

Mortality

Part of the project asks you to look at mortality at day 4 and day 11

The number of dead piglets is not stored in the data

We need to compute it

piglet <- piglet |>
  mutate(
    n_dead_day4 = liveborn - live_day4,
    n_dead_daya11 = liveborn - live_day11
  )

Over dispersion

Rangstrup-Christensen et al (2018) used a Negative binomial model

The Negative binomial distribution is also for a count response

It allows for data that are overdispersed relative to the Poisson

Overdispersion means that the data show more variance (spread) than expected

In the Poisson the mean is also the variance of the distribution: \(\text{var}(Y) = \mu = \lambda\)

In the Negative binomial, the variance is \(\text{var}(Y) = \mu \times \mu^2 / \theta\)

Time at risk

Rangstrup-Christensen et al (2018) also used live time at risk

This was used in place of litter size to normalize the mortality rate across litter sizes

You can think of this variable as piglet time at risk

They did this because the time at risk (days until castration) and the number of live born piglets varied per sow and time of death was not available

Average birth weight

Average birth weight

An additional question asks you to look at the relationship between litter size and piglet weight at birth

We don’t have the weight of each piglet, but we do have the average weight of the litter

The response is avg_birthw_kg in kilograms

What type of data is this? What GLM could you fit?

05:00

Random effects

Random effects

Several of the papers use random effects

I included field as a fixed effect — we interpret this as us being interested in those specific fields

The model can only predict for those fields

In reality the pigs could have been in any fields — the specific field isn’t important

random effects allow us to treat field as a random factor

Also allows prediction at the average field (i.e. ignoring the contribution of field)

We’ll cover random effects in an elective course in year 3 if you are doing a research project

Extras

Comparisons

Consider the Poisson model for number of stillborn

m_still_p <- glm(
  stillborn ~ field + parity + totalborn,
  family = poisson(link = "log"),
  data = piglet
)

Are there differences between parities?

Answering this requires us to us the model to do a comparison

Comparisons

The question

Are there differences between parities?

is an incomplete question; the size of an difference between parities depends on totalborn (litter size)

We need to

  • condition on totalborn to answer that question

  • or average the difference over a range of values for totalborn (litter size)

Why?

  • The model is non-linear — through the link function

  • Effects are multiplicative on the response scale

Comparisons I — averaging

  • Replicate data & modify so each sow “experiences” each parity
  • Predict \(\mathbb{E}(\mathtt{stillborn})\) for replicated data
  • Comparison are sequential; 1–2, 2–3, etc
  • Average comparisons over all sows
m_still_p |>
  avg_comparisons(
    variables = list(parity = "sequential")
  )

 Contrast Estimate Std. Error      z Pr(>|z|)   S  2.5 % 97.5 %
    2 - 1    0.154      0.179  0.860    0.390 1.4 -0.196  0.504
    3 - 2   -0.111      0.171 -0.649    0.517 1.0 -0.446  0.224
    4 - 3    0.140      0.239  0.586    0.558 0.8 -0.328  0.608
    5 - 4   -0.155      0.243 -0.637    0.524 0.9 -0.631  0.322

Term: parity
Type: response

Comparisons II — conditioning

  • Condition on one or more values of totalborn
  • Comparison are sequential; 1–2, 2–3, etc
m_still_p |>
  comparisons(
    variables = list(parity = "sequential"),
    newdata = datagrid(totalborn = c(15, 20))
  )

 Contrast totalborn Estimate Std. Error      z Pr(>|z|)   S  2.5 % 97.5 %
    2 - 1        15   0.0985      0.118  0.835    0.404 1.3 -0.133  0.330
    2 - 1        20   0.3848      0.452  0.852    0.394 1.3 -0.500  1.270
    3 - 2        15  -0.0711      0.109 -0.653    0.514 1.0 -0.285  0.142
    3 - 2        20  -0.2779      0.436 -0.637    0.524 0.9 -1.132  0.577
    4 - 3        15   0.0898      0.152  0.592    0.554 0.9 -0.207  0.387
    4 - 3        20   0.3505      0.591  0.593    0.553 0.9 -0.807  1.508
    5 - 4        15  -0.0993      0.155 -0.642    0.521 0.9 -0.403  0.204
    5 - 4        20  -0.3880      0.600 -0.647    0.518 0.9 -1.564  0.788

Term: parity
Type: response

Comparisons II — conditioning

  • Condition on many values of totalborn
m_still_p |> 
  plot_comparisons(
    variables = list(parity = "sequential"), # which variable to compare
    condition = "totalborn"                  # which variable to condition on
  )

Interaction

However, the model only encodes an additive effect of parity: add an interaction effect

m_still_p_int <- glm(
  stillborn ~ field + parity + totalborn + parity:totalborn,
  family = poisson(link = "log"),
  data = piglet
)

Comparisons I — averaging

  • Replicate data & modify so each sow “experiences” each parity
  • Predict \(\mathbb{E}(\mathtt{stillborn})\) for replicated data
  • Comparison are sequential; 1–2, 2–3, etc
  • Average comparisons over all sows
m_still_p_int |>
  avg_comparisons(
    variables = list(parity = "sequential")
  )

 Contrast Estimate Std. Error      z Pr(>|z|)   S  2.5 % 97.5 %
    2 - 1  -0.0594      0.308 -0.193    0.847 0.2 -0.663  0.545
    3 - 2  -0.0929      0.161 -0.577    0.564 0.8 -0.408  0.222
    4 - 3   0.1112      0.227  0.490    0.624 0.7 -0.334  0.556
    5 - 4  -0.1087      0.231 -0.471    0.638 0.6 -0.561  0.344

Term: parity
Type: response

Comparisons II — conditioning

  • Condition on one or more values of totalborn
  • Comparison are sequential; 1–2, 2–3, etc
m_still_p_int |>
  comparisons(
    variables = list(parity = "sequential"),
    newdata = datagrid(totalborn = c(10, 15))
  )

 Contrast totalborn Estimate Std. Error       z Pr(>|z|)   S   2.5 % 97.5 %
    2 - 1        10  0.32466     0.2283  1.4222    0.155 2.7 -0.1228 0.7721
    2 - 1        15  0.23894     0.1588  1.5046    0.132 2.9 -0.0723 0.5502
    3 - 2        10 -0.30853     0.2247 -1.3729    0.170 2.6 -0.7490 0.1319
    3 - 2        15 -0.24056     0.1629 -1.4763    0.140 2.8 -0.5599 0.0788
    4 - 3        10 -0.00508     0.0659 -0.0770    0.939 0.1 -0.1342 0.1241
    4 - 3        15  0.02309     0.1964  0.1176    0.906 0.1 -0.3618 0.4080
    5 - 4        10  0.03027     0.0738  0.4103    0.682 0.6 -0.1143 0.1749
    5 - 4        15  0.01822     0.2010  0.0906    0.928 0.1 -0.3757 0.4121

Term: parity
Type: response

Comparisons II — conditioning

  • Condition on many values of totalborn
m_still_p_int |> 
  plot_comparisons(
    variables = list(parity = "sequential"), # which variable to compare
    condition = "totalborn"                  # which variable to condition on
  )

Comparison III — litter size

  • Main question is in terms of litter size (totalborn)
  • Replicate data and set litter size = 10 and 15
  • Predict, compare predictions at both litter sizes, average, test
  • This averages over parity
m_still_p_int |>
  avg_comparisons(
    variables = list(totalborn = c(10, 15)),
    comparison = "difference"
  )

 Estimate Std. Error    z Pr(>|z|)    S 2.5 % 97.5 %
    0.197     0.0332 5.93   <0.001 28.3 0.132  0.262

Term: totalborn
Type: response
Comparison: 15 - 10

Comparison III — litter size by parity

  • As previous, but don’t average over parity
  • Condition on parity
m_still_p_int |>
  avg_comparisons(
    variables = list(totalborn = c(10, 15)),
    by = "parity",
    comparison = "difference"
  )

 parity Estimate Std. Error    z Pr(>|z|)    S   2.5 % 97.5 %
      1    0.213     0.0538 3.97   <0.001 13.7  0.1079  0.319
      2    0.109     0.1497 0.73   0.4651  1.1 -0.1840  0.403
      3    0.218     0.0559 3.91   <0.001 13.4  0.1089  0.328
      4    0.162     0.0945 1.71   0.0868  3.5 -0.0234  0.347
      5    0.217     0.0499 4.35   <0.001 16.2  0.1195  0.315

Term: totalborn
Type: response
Comparison: 15 - 10